Post Processing Music Similarity Computations
Abstract
Today, algorithms based on Mel Frequency Cepstrum Coefficients (MFCCs) are among the best-performing algorithms for music similarity computation. In these algorithms, each music track is modelled as a Gaussian Mixture Model (GMM) of MFCCs, and the similarity between two tracks is computed by comparing their GMMs. As pointed out in [1, 2, 3], the distance space obtained this way has some undesirable properties. In this MIREX'06 submission, a technique has been implemented that aims to correct such anomalies to a certain extent (for more detailed evaluations, please refer to [4]). The described algorithm ranked second (out of six) in the MIREX evaluation based on human listeners (note that the differences between the top five ranked algorithms are not statistically significant). There are indications that it works better for artist identification than the other submitted algorithms.

1. Feature Extraction and Basic Distance Computation

The basic feature extraction process is quite similar to the one in [5]. It was chosen because of its good tradeoff between runtime and quality, and because algorithms based on related techniques yielded good results in MIREX'05.

• The input wave files (22,050 Hz sampling rate, mono) are divided into frames of 512 samples, with an overlap of 256 samples, disregarding the first and last 30 seconds.
• The number of frames corresponding to 2 minutes (i.e. 20,672 frames) is used for feature extraction. In the submitted algorithm, these frames are not consecutive. Instead, the wave data is divided into 20,672 fragments of equal length, and from each fragment 512 consecutive samples are chosen at random for feature extraction. Choosing the frames randomly reduces possible aliasing effects with respect to the track's meter. This approach seems to yield better results than choosing the frames in a fully random manner, or taking all frames from the two minutes in the middle of the track.
• From the chosen frames, 25 MFCCs are computed.
• A song is represented as the overall mean of the MFCCs and the full covariance matrix.

The feature extraction process was implemented using the MA-Toolbox [6]. Two songs are compared by the Kullback-Leibler (KL) distance. If the inverse of a song's covariance matrix cannot be found, the song is assumed to be dissimilar to all other songs.

One drawback of this technique is that it does not take the temporal order of frames into consideration, so aspects related to time are not modelled. An approach to adding time-dependent features is proposed in [2]. However, the version used here is a good starting point for the post-processing step described in the next section.
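
The frame selection and the basic distance described above can be sketched as follows. This is a minimal illustration and not the submitted implementation, which used the MA-Toolbox. The function names (`pick_frame_starts`, `invert`, `kl_distance`) are hypothetical. Two assumptions are made: each random frame starts inside its own fragment, and the "KL distance" is the symmetrised KL divergence between two single Gaussians, for which the log-determinant terms cancel so that only the covariance inverses are needed. The code uses only the Python standard library to stay self-contained; a linear algebra library would normally be used for the matrix inverse.

```python
import math
import random


def pick_frame_starts(n_samples, n_frames, frame_len=512, rng=None):
    """Pick one random frame start per equal-length fragment of the signal."""
    rng = rng or random.Random()
    frag_len = n_samples / n_frames
    last_start = n_samples - frame_len  # latest start that still fits a frame
    starts = []
    for i in range(n_frames):
        # Assumption: the frame starts inside fragment i (clamped to the signal).
        lo = min(int(i * frag_len), last_start)
        hi = min(int((i + 1) * frag_len) - 1, last_start)
        starts.append(rng.randint(lo, max(lo, hi)))
    return starts


def invert(matrix, eps=1e-12):
    """Gauss-Jordan inverse with partial pivoting; returns None if singular."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < eps:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kl_distance(mean_a, cov_a, mean_b, cov_b):
    """Symmetrised KL divergence between two full-covariance Gaussians."""
    inv_a, inv_b = invert(cov_a), invert(cov_b)
    if inv_a is None or inv_b is None:
        # No inverse found: treat the song as dissimilar to all other songs.
        return math.inf
    d = len(mean_a)
    diff = [x - y for x, y in zip(mean_a, mean_b)]
    # tr(inv_b * cov_a) + tr(inv_a * cov_b)
    trace = sum(inv_b[i][j] * cov_a[j][i] + inv_a[i][j] * cov_b[j][i]
                for i in range(d) for j in range(d))
    # diff^T (inv_a + inv_b) diff
    maha = sum(diff[i] * (inv_a[i][j] + inv_b[i][j]) * diff[j]
               for i in range(d) for j in range(d))
    return 0.5 * (trace - 2 * d + maha)
```

In this sketch, each song would be summarised by the mean vector and full covariance matrix of its 25 MFCCs, and `kl_distance` would be called on two such summaries. Returning infinity for a non-invertible covariance matrix is one way to express "dissimilar to all other songs".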
Similar resources
SiMPle: Assessing Music Similarity Using Subsequences Joins
Most algorithms for music information retrieval are based on the analysis of the similarity between feature sets extracted from the raw audio. A common approach to assessing similarities within or between recordings is by creating similarity matrices. However, this approach requires quadratic space for each comparison and typically requires a costly post-processing of the matrix. In this work, ...
Improving Rhythmic Similarity Computation by Beat Histogram Transformations
Rhythmic descriptors are often utilized for semantic music classification, such as genre recognition or tempo detection. Several algorithms dealing with the extraction of rhythmic information from music signals were proposed in literature. Most of them derive a so-called beat histogram by auto-correlating a representation of the temporal envelope of the music signal. To circumvent the problem o...
Perception of Rhythmic Similarity in Reich’s Clapping Music: Factors and Models
Background An essential aspect of music is that it unfolds over time. Thus, understanding the perception and processing of the temporal organization of musical events (rhythm and metre) is critical to understanding music cognition and perception. The perception of similarity has been used as a measure of the underlying processing of categories of stimuli. There are various approaches and theori...
Robust Singing Transcription System Using Local Homogeneity in the Harmonic Structure
Automatic music transcription from audio has long been one of the most intriguing problems and a challenge in the field of music information retrieval, because it requires a series of low-level tasks such as onset/offset detection and F0 estimation, followed by high-level post-processing for symbolic representation. In this paper, a comprehensive transcription system for monophonic singing voic...
Categorising Folk Melodies Using Similarity Ratings
The ability to classify musical styles is an important and intriguing task from the perspective of music cognition. This process, which listeners usually do effortlessly, involves integrating a number of perceptual processes. Recent summaries on categorisation divide these into two; 1) rule application 2) similarity computations (Smith & Patalano, Jonides, 1998; Hahn & Chater, 1998). This paper...